Correlations in connected random graphs

We study the properties of the giant connected component in random graphs with arbitrary degree
distribution. We concentrate on the degree-degree correlations. We show that the adjoining nodes 
in the giant connected component are correlated and derive analytic formulas for the joint nearest- 
neighbor degree probability distribution. Using those results we describe correlations in maximal 
entropy connected random graphs. We show that connected graphs are disassortative and that 
correlations are strongly related to the presence of one-degree nodes (leaves). We propose an efficient 
algorithm for generating connected random graphs. We illustrate our results with several examples.

I. INTRODUCTION

In the last decade or so, there has been a great in- 
crease of interest in the theory of random graphs and 
networks (in the following we will use those two terms 
interchangeably). While in principle this is a branch of 
mathematics, much of this effort was fueled by the availability of "experimental" data on real graphs (see [1] for a review). These data are compared to the predictions of various random graph models. Probably the best known and simplest example of such reference models is the ensemble of all labeled graphs with V vertices and L links (without multiple links and self-links), chosen with uniform probability. We will call this model Erdős-Rényi (ER) graphs after the authors who were the first to introduce and study them.

The ER ensemble is the simplest example of the so- 
called "maximally random" graphs. Intuitively those are 
the ensembles where the distributions of vertices and 
links joining them are "as random as possible" for a given 
set of constraints. In the case of ER graphs the only constraints are the fixed number of links and vertices. The "maximal randomness" can be formalized using the notion of entropy (see next section). The maximally random ensembles serve as null hypotheses. For example, it was the deviation of data collected on the World Wide Web (WWW) graph from the predictions of the ER model that triggered the interest in random networks, because it implied that those graphs were not created just by joining vertices at random, but required the existence of another mechanism [3].

A popular generalization of the ER ensemble is the ensemble of graphs with a given degree distribution (the degree of a node is the number of links attached to it). One feature of those ensembles is the absence of correlations between neighboring nodes' degrees, at least for degree distributions without heavy tails (see the discussion in Sec. IV C). The object of our study was to find what happens when we restrict ourselves to connected graphs only. A simple argument indicates that correlations will appear: a neighbor of a node with degree 1 (a leaf) must have degree greater than 1; otherwise, the two would form a separate connected component. Similarly, the neighbors of a node cannot all have degree 1, as such a "hedgehog" would also form a separate connected component [11], [12]. This obviously leads to correlations. It is not clear, however, how strong they are and whether they survive the large-V limit. We have already studied those correlations numerically in Ref. [12] and found that they also appear in large graphs. In this paper we derive the analytic formulas describing them. We also found a strong indication that the described mechanism is the only one responsible for the correlations in maximally random connected graphs: when we forbid vertices with degree 1, correlations disappear.

Connectivity is a nonlocal constraint that is hard to deal
with. To study the properties of connected graphs we
use another feature of maximally random graphs with 
a given degree distribution: the appearance of a con- 
nected component that includes a finite fraction of all 
the vertices (and links). From the properties of this gi- 
ant connected component we can infer the properties of 
connected graphs. 

The paper is organized as follows: Section II introduces some basic definitions concerning random graphs. In Sec. III we present the method of generating functions used to study the properties of the giant connected component in random graphs with arbitrary degree distribution [6]. Then we calculate degree-degree correlations in the giant component. Section IV contains some examples where we compare our predictions with the results of Monte Carlo (MC) simulations. Finally, we show in Sec. V how to relate connected random graphs to giant connected components in other ensembles. In Sec. VI we address the situation when correlations in random graphs are suppressed by the absence of vertices with degree 1 (leaves). The paper is summarized in Sec. VII.

II. RANDOM GRAPHS

A. Average degree 

Formally we consider random graphs as an ensemble of graphs with probability $P(G)$ assigned to every graph $G \in \mathcal{G}$. Using this definition we introduce the entropy of the ensemble:

$$S = -\sum_{G \in \mathcal{G}} P(G) \ln P(G). \tag{1}$$

The maximally random ensembles described in the previous section are those which for given constraints have maximal entropy.

Denoting by $O(G)$ some property of graph $G$ we can calculate its average over the whole ensemble:

$$\langle O \rangle_{\mathcal{G}} \equiv \sum_{G \in \mathcal{G}} O(G)\,P(G). \tag{2}$$

The most widely studied example is the probability distribution of node degrees:

$$p_k \equiv \left\langle \frac{n_k(G)}{V(G)} \right\rangle_{\mathcal{G}}, \tag{3}$$

where $n_k(G)$ is the number of vertices with degree $k$ and $V(G)$ is the total number of vertices in graph $G$ (in the following we will often omit the argument $G$). The mean of this distribution is the "link density,"

$$\langle k \rangle \equiv \sum_k k\,p_k = \left\langle \frac{2L(G)}{V(G)} \right\rangle_{\mathcal{G}}, \tag{4}$$

because $\sum_k k\,n_k = 2L(G)$; by $L(G)$ we denote the number of links in graph $G$.

However, what is frequently observed is not an average (2), but the properties of a single graph (e.g., the WWW). That is why we are actually interested in the probability that our model will produce a graph with those properties. It is described by the distribution

$$P(O) \equiv \sum_{G \in \mathcal{G}} \delta\big(O - O(G)\big)\,P(G). \tag{5}$$

In many cases this distribution is sufficiently well characterized by its mean $\langle O \rangle$, with relative fluctuations disappearing in the large-$V$ limit. In this situation we will say that $O$ is self-averaging. In such a case one can infer the properties of the whole ensemble from the properties of just one large graph. We want to emphasize, however, that this is only an assumption that has to be checked for each particular model.

In Appendix A we show for illustration a definition of a non-self-averaging ensemble. Although this is an artificial example, let it serve as a warning. In this paper we assume that our models are self-averaging without any further formal proofs.



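We end with the following comment: since fluctuations do not matter in a self-averaging ensemble, in the large-volume limit we have

$$p_k \equiv \left\langle \frac{n_k(G)}{V(G)} \right\rangle_{\mathcal{G}} \approx \frac{\langle n_k(G) \rangle_{\mathcal{G}}}{\langle V(G) \rangle_{\mathcal{G}}}. \tag{6}$$

We will use this kind of approximation in the following sections.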



B. Correlations 

The distribution $p_k$ does not give any information about the correlations between vertices. An obvious generalization is the joint distribution $p_{q,r}$, which describes the probability that a pair of nearest neighbors (NNs) has degrees $q$ and $r$ (we assume that we pick a pair of NNs with uniform probability):

$$p_{q,r} \equiv \left\langle \frac{n_{q,r}}{2L} \right\rangle, \tag{7}$$

where $n_{q,r}$ denotes the number of links with their start point having degree $q$ and endpoint having degree $r$. Note that we treat each undirected link as two directed links. On an undirected graph,

$$n_{q,r} = n_{r,q}, \qquad \sum_{q,r} n_{q,r} = 2L, \qquad \text{and} \qquad \sum_r n_{q,r} = q\,n_q. \tag{8}$$

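If vertex degrees are independent, the probability (7) should factorize:

$$p_{q,r} = \tilde{p}_q\,\tilde{p}_r, \qquad \tilde{p}_q \equiv \sum_r p_{q,r}, \tag{9}$$

leading to the relation

$$\left\langle \frac{n_{q,r}}{2L} \right\rangle = \left\langle \frac{q\,n_q}{2L} \right\rangle \left\langle \frac{r\,n_r}{2L} \right\rangle. \tag{10}$$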

One should, however, keep in mind that this defines the absence of correlations in the ensemble of graphs. A more appropriate question could be, are the vertices on individual graphs uncorrelated (see previous section)? The condition for absence of correlations between vertices in each individual graph $G$ is

$$\frac{n_{q,r}(G)}{2L(G)} = \frac{q\,n_q(G)}{2L(G)}\,\frac{r\,n_r(G)}{2L(G)} \tag{11}$$

or, after averaging,

$$\left\langle \frac{n_{q,r}}{2L} \right\rangle = \left\langle \frac{q\,n_q}{2L}\,\frac{r\,n_r}{2L} \right\rangle. \tag{12}$$



As already pointed out, for a large class of ensembles conditions (10) and (12) are equivalent in the large-volume limit. However, it is easy to check that for the non-self-averaging ensemble in Appendix A vertices on each individual graph are uncorrelated according to the condition (12), but correlated according to (10). Again, we leave this as a warning and proceed further with the assumption that our models are self-averaging and that those two conditions are equivalent.

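In practice, checking such a condition is difficult, as it entails measuring a two-dimensional distribution with good accuracy. Therefore we introduce another quantity:

$$\bar{k}(k) \equiv \left\langle \frac{1}{k\,n_k} \sum_q q\,n_{q,k} \right\rangle. \tag{13}$$

It describes the average degree of the nearest neighbors of a vertex with degree $k$. Obviously $\bar{k}(k)$ is defined for a given $k$ only if $n_k > 0$. $\bar{k}(k)$ can be interpreted as the first moment of the conditional probability

$$p(q|k) \equiv \frac{p_{q,k}}{\tilde{p}_k}, \tag{14}$$

where $\tilde{p}_k \equiv \sum_q p_{q,k}$ is the marginal of the joint distribution. Assuming self-averaging,

$$\bar{k}(k) \approx \sum_q q\,p(q|k). \tag{15}$$

If the degrees are independent, $\bar{k}(k)$ should not depend on $k$, and (12) implies

$$\bar{k}(k) \approx \sum_q q^2 \left\langle \frac{n_q}{2L} \right\rangle \approx \frac{\langle k^2 \rangle}{\langle k \rangle}. \tag{16}$$

When $\bar{k}(k)$ grows with $k$ the graph is called assortative, and when it decreases, disassortative.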
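The counting behind $n_{q,r}$ and $\bar{k}(k)$ can be sketched for a single graph as follows. This is a minimal illustration, not the code used for the simulations in this paper: the function name `degree_correlations` and the edge-list input format are assumptions, and the ensemble average in Eq. (13) is omitted (one graph only).

```python
from collections import defaultdict


def degree_correlations(edges):
    """Count n_{q,r} and estimate k_bar(k) for one undirected simple graph.

    `edges` is a list of (a, b) vertex pairs; each link is listed once.
    """
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    # n_qr[(q, r)]: number of directed links from a degree-q to a degree-r vertex.
    # Each undirected link is counted twice, once per direction.
    n_qr = defaultdict(int)
    for a, b in edges:
        n_qr[(degree[a], degree[b])] += 1
        n_qr[(degree[b], degree[a])] += 1

    # n_k[k]: number of vertices with degree k (only k >= 1 appears here).
    n_k = defaultdict(int)
    for d in degree.values():
        n_k[d] += 1

    # k_bar(k) = (1 / (k n_k)) * sum_q q n_{q,k}, without the ensemble average.
    neighbor_degree_sum = defaultdict(int)
    for (q, k), count in n_qr.items():
        neighbor_degree_sum[k] += q * count
    k_bar = {k: neighbor_degree_sum[k] / (k * n_k[k]) for k in n_k}
    return dict(n_qr), k_bar


# A star with three leaves: the hub has degree 3, each leaf has degree 1.
star_n_qr, star_k_bar = degree_correlations([(0, 1), (0, 2), (0, 3)])
```

For the star, every leaf sees a neighbor of degree 3 and the hub sees neighbors of degree 1, so $\bar{k}(k)$ decreases with $k$: the star is disassortative in the sense defined above.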



III. CONNECTED COMPONENTS 

In general, maximally random graphs with a given degree distribution do not need to be connected. However, if

$$\sum_k k(k-2)\,p_k > 0 \tag{17}$$

(which translates into $z > 1$ in the case of ER graphs, where $z \equiv \langle k \rangle$ is the average degree), one of the connected components (called the giant connected component) will gather a finite fraction of all links and vertices. This is a phenomenon akin to percolation. In Ref. [6] the size of the giant component and the size distribution of finite components were calculated. The degree distribution in the giant component $p_k^{(g)}$ has also been calculated previously. Here we generalize those results and calculate the two-point distribution $p_{q,r}^{(g)}$ and $\bar{k}^{(g)}(k)$ for the giant component.

We will use the method of generating functions introduced in [6]. The crucial observation is that the finite connected components are essentially trees. That is because a link emerging from one of the vertices in the component has a probability $\propto s/V$ of connecting back to a node from this component, where $s$ is the size of the component. So for finite $s$ this becomes negligible in the large-$V$ limit.

Now let us pick a link from the graph at random. It belongs to some connected component. We will call $P_1(s)$ the probability that cutting this link will split the component into two parts, one of them finite and having size $s$. Stated differently, $P_1(s)$ is the probability that a randomly chosen link will lead into a finite part of size $s$. By the argument above this finite "half" will be a tree. Because of that, one can write down the equation for the generating function $H_1(x) \equiv \sum_s P_1(s)\,x^s$:

$$H_1(x) = x\,G_1(H_1(x)), \tag{18}$$

where

$$G_1(x) \equiv \frac{G_0'(x)}{G_0'(1)} = \frac{1}{z}\,G_0'(x), \qquad G_0(x) \equiv \sum_{k=0}^{\infty} p_k\,x^k. \tag{19}$$

We denote by $u$ the value of $H_1(1)$:

$$u \equiv H_1(1) = \sum_s P_1(s). \tag{20}$$

When there is no giant component in the graph, all connected components are finite and are trees. This means that cutting each link will result in two finite parts; thus, $u = 1$. However, when the giant component appears, there is a nonzero probability that the chosen link will belong to this component and that cutting it will either split the component into two infinite parts or will not split it at all. As this probability is missing from $P_1(s)$, the sum (20) will be smaller than one. $u$ is to be interpreted as the probability that a randomly chosen link, followed in one direction, leads to a finite part of the graph [10]. It follows that $u^2$ is the probability that a random link belongs to a finite component of arbitrary size.

That can be derived in a more explicit way. Let us denote by $P_{1,1}(s)$ the probability that a randomly chosen link belongs to a component of size $s$. Then,

$$P_{1,1}(s) = \sum_{t=0}^{s} P_1(t)\,P_1(s-t). \tag{21}$$



It is a convolution of the probability distribution $P_1(s)$ with itself, so its generating function is just $H_1^2(x)$. Then $u^2 = H_1^2(1) = \sum_s P_{1,1}(s)$ is the probability that a link belongs to a finite connected component of arbitrary size and $1 - u^2$ is the probability that it is inside the giant component.

Finally, if we denote by $P_0(s)$ the probability that a randomly chosen vertex belongs to a finite component of size $s$, we can obtain its generating function $H_0(x)$ from $H_1(x)$:

$$H_0(x) \equiv \sum_s P_0(s)\,x^s = x\,G_0(H_1(x)). \tag{22}$$

By the same arguments as above,

$$h \equiv H_0(1) = G_0(u) \tag{23}$$

is the probability that a randomly chosen vertex belongs to a finite connected component and $1 - h$ is the probability that it belongs to the giant component.

It follows from (18) and (20) that $u$ is the solution of the equation

$$u = G_1(u). \tag{24}$$

From the definition (19) it is easy to note that $u = 1$ is always a solution, but when condition (17) is fulfilled the above equation has a solution smaller than 1 as well. As argued, this signals the appearance of a giant component.
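The self-consistency equation $u = G_1(u)$ and the relation $h = G_0(u)$ can be evaluated numerically for any degree distribution given as a finite list. The sketch below is only an illustration under stated assumptions: the function name `solve_giant_component`, the truncation of the distribution at a finite maximal degree, and the plain fixed-point iteration started at $u = 0$ are choices made here, not taken from the paper. Because $G_1$ is nondecreasing on $[0, 1]$, iterating from $0$ approaches the smallest non-negative solution.

```python
import math


def solve_giant_component(p, tol=1e-12, max_iter=1_000_000):
    """Return (u, h, z) for a degree distribution p[k], k = 0..len(p)-1."""
    z = sum(k * pk for k, pk in enumerate(p))  # average degree

    def G0(x):
        return sum(pk * x**k for k, pk in enumerate(p))

    def G1(x):
        return sum(k * pk * x**(k - 1) for k, pk in enumerate(p) if k > 0) / z

    u = 0.0  # start below the smallest fixed point
    for _ in range(max_iter):
        u_next = G1(u)
        converged = abs(u_next - u) < tol
        u = u_next
        if converged:
            break
    return u, G0(u), z


# Poissonian (ER) distribution with z = 2, truncated at k = 60 (assumption).
poisson = [math.exp(-2.0) * 2.0**k / math.factorial(k) for k in range(61)]
u, h, z = solve_giant_component(poisson)
```

Close to the threshold (17) the iteration converges slowly, so a root finder with better convergence may be preferable there.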



A. Average degree 

Using the results of the previous section it is easy to derive formulas for the average degree in the giant component $z^{(g)}$ and in the rest of the graph $z^{(f)}$:

$$z^{(g)} \equiv \left\langle \frac{2L^{(g)}}{V^{(g)}} \right\rangle = z\,\frac{1 - u^2}{1 - h}, \tag{25a}$$

$$z^{(f)} \equiv \left\langle \frac{2L^{(f)}}{V^{(f)}} \right\rangle = z\,\frac{u^2}{h}. \tag{25b}$$

As we have already pointed out, the giant connected component is not a tree. The number of independent loops that it contains equals

$$V\left[\frac{z}{2}\left(1 - u^2\right) - (1 - h)\right] + 1, \tag{26}$$

and as all the remaining connected components are trees, this is also the number of loops in the whole graph.

We can also easily calculate the number of finite connected components $n_{cn}$, knowing that they form a forest. The number of links in the forest is $L^{(f)} = V^{(f)} - n_{cn}$, which gives

$$\langle n_{cn} \rangle = \left(h - \frac{z}{2}\,u^2\right) V. \tag{27}$$

From that we can derive the formula for the average size of a finite connected component:

$$\langle s \rangle = \frac{hV}{\langle n_{cn} \rangle} = \frac{2h}{2h - z\,u^2}. \tag{28}$$


B. Degree distribution 



In this section we will calculate the degree distribution $p_k^{(f)}$ in the nongiant part of the graph. From the relation

$$p_k = (1 - h)\,p_k^{(g)} + h\,p_k^{(f)} \tag{29}$$

we automatically get the distribution $p_k^{(g)}$ in the giant component. This has already been done before, but we find it instructive to use the same method of generating functions as described in Sec. III. The idea is to apply it only to the graph with the giant component excluded, i.e., to the finite connected components. We will use a tilde to denote the generating functions of the sought probability:

$$\tilde{G}_0(x) \equiv \sum_{k=0}^{\infty} p_k^{(f)}\,x^k. \tag{30}$$



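Using the argument from Ref. [6] we obtain the same equations

$$\tilde{H}_1(x) = x\,\tilde{G}_1(\tilde{H}_1(x)), \tag{31a}$$

$$\tilde{H}_0(x) = x\,\tilde{G}_0(\tilde{H}_1(x)), \tag{31b}$$

for the generating functions of the probabilities $\tilde{P}_1(s)$ and $\tilde{P}_0(s)$, with $\tilde{G}_1(x) \equiv \tilde{G}_0'(x)/\tilde{G}_0'(1)$ in analogy with (19). Here $\tilde{P}_0(s)$ is the probability that a vertex belongs to a finite component of size $s$ provided that it belongs to a finite component and $\tilde{P}_1(s)$ is the probability that a link leads into a finite component of size $s$ provided that it leads into a finite component. From this we can write the relations

$$P_0(s) = h\,\tilde{P}_0(s), \qquad P_1(s) = u\,\tilde{P}_1(s), \tag{32}$$

which leads to

$$H_0(x) = h\,\tilde{H}_0(x), \tag{33a}$$

$$H_1(x) = u\,\tilde{H}_1(x). \tag{33b}$$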



To solve Eqs. (30), (31), and (33) for $p_k^{(f)}$ we make the ansatz

$$p_k^{(f)} = \frac{p_k\,a^k}{G_0(a)}. \tag{34}$$

Then,

$$\tilde{G}_0(x) = \frac{G_0(xa)}{G_0(a)}, \qquad \tilde{G}_1(x) = \frac{G_1(xa)}{G_1(a)}, \tag{35}$$

so that Eq. (31a) can be rewritten as

$$a\,\tilde{H}_1(x) = \frac{a}{G_1(a)}\,x\,G_1\big(\tilde{H}_1(x)\,a\big). \tag{36}$$

Comparing with (18) we see that it will be fulfilled if

$$a\,\tilde{H}_1(x) = H_1\!\left(\frac{a\,x}{G_1(a)}\right). \tag{37}$$

Inserting this into (33b) we get

$$a\,\tilde{H}_1(x) = u\,\tilde{H}_1\!\left(\frac{a\,x}{G_1(a)}\right), \tag{38}$$

which, because of Eq. (24), can be solved by putting $a = u$.

FIG. 1: Average degree $z$ (dashed line), average degree $z^{(g)}$ of the connected component (upper solid line), and average degree $z^{(f)}$ of the rest (lower solid line) as a function of $z$ for ER graphs. Circles mark the results of MC simulations.

FIG. 2: Degree distribution for ER graphs with $z = 2$. Circles mark the results of MC simulations for the giant component and diamonds for the full graph. Solid lines denote analytical solutions.



Now we must check Eq. (33a). Using Eqs. (35), (31b), and (33b) we get

$$h\,\tilde{H}_0(x) = h\,x\,\tilde{G}_0(\tilde{H}_1(x)) = h\,x\,\frac{G_0(u\,\tilde{H}_1(x))}{G_0(u)}. \tag{39}$$

So finally,

$$h\,\tilde{H}_0(x) = x\,h\,\frac{G_0(H_1(x))}{h} = H_0(x). \tag{40}$$



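From that and relation (29) we get the formula for the degree distribution in the giant component:

$$p_k^{(g)} = p_k\,\frac{1 - u^k}{1 - h}. \tag{41}$$

In the limit $u \to 1$ and $h \to 1$ this reduces to

$$p_k^{(g)} = \frac{k}{z}\,p_k. \tag{42}$$

In this limit the connected giant cluster is a tree. Indeed, one can check that

$$\lim_{u \to 1} z^{(g)} = 2. \tag{43}$$

To see this we must first note that Eq. (24) always has the solution $u = 1$. It becomes the only one when $G_1'(1) = 1$, which is equivalent to the condition $\sum_k k(k-2)\,p_k = 0$, i.e., the threshold of condition (17). Close to this point $1 - h \approx z(1 - u)$ and $1 - u^2 \approx 2(1 - u)$, so Eq. (25a) gives $z^{(g)} \to 2$.

C. Correlations

To calculate $p_{q,r}^{(g)}$ we use the relation

$$n_{q,r}(G) = n_{q,r}^{(g)}(G) + n_{q,r}^{(f)}(G). \tag{44}$$

We have already assumed that vertex degrees are uncorrelated; we further assume that this is also true for the finite connected components (nongiant) part of the graph. Assuming self-averaging and using Eq. (10) for $n_{q,r}$ and $n_{q,r}^{(f)}$ we obtain

$$p_{q,r}^{(g)} = \frac{q\,p_q\,r\,p_r}{z^2}\,\frac{1 - u^{q+r-2}}{1 - u^2} \tag{45}$$

and

$$\bar{k}^{(g)}(k) = \frac{\langle k^2 \rangle - z\,u^k\big(1 + G_1'(u)\big)}{z\,\big(1 - u^k\big)}. \tag{46}$$

In the derivation we have used the relation $\langle A/B \rangle \approx \langle A \rangle / \langle B \rangle$, which should be valid for self-averaging quantities in the large-$V$ limit. Comparing this with formulas (10) and (16) we note that the correlations disappear in the limit $u \to 0$. In the tree limit $u \to 1$ the formulas above take the form

$$\lim_{u \to 1} p_{q,r}^{(g)} = \frac{1}{2}\,(q + r - 2)\,\frac{q\,p_q\,r\,p_r}{z^2} \tag{47}$$

and

$$\lim_{u \to 1} \bar{k}^{(g)}(k) = \frac{1}{k z}\Big((k - 2)\,\langle k^2 \rangle + \langle k^3 \rangle\Big). \tag{48}$$

IV. EXAMPLES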

While deriving our formulas we have made several assumptions: (i) the vertex degrees are uncorrelated, (ii) the measured quantities are self-averaging, and of course (iii) all the derivations are only valid in the large-$V$ limit. To check to what extent those assumptions are satisfied and, more importantly, to check the magnitude of the finite-size effects, we have compared our predictions to the results of MC simulations of moderate-sized graphs (5000 vertices). To simulate ER graphs we used a straightforward algorithm which connects vertices at random. To generate maximally random graphs with a given distribution we used a previously described method, which consists of generating graphs with suitably chosen one-point weights using a Metropolis-type algorithm.

FIG. 3: $\bar{k}(k)$ for ER graphs with $z = 2$. Circles mark the results of MC simulations for the giant component and diamonds for the full graph; solid lines stand for analytical solutions.



A. Erdős-Rényi graphs

For ER graphs the distribution $p_k$ is Poissonian,

$$p_k = e^{-z}\,\frac{z^k}{k!} \qquad \text{and} \qquad G_0(x) = G_1(x) = e^{z(x-1)}. \tag{49}$$

It follows that $H_1(x) = H_0(x) \equiv H(x)$, so $h = u$, with $h$ being the positive solution closest to one (from below) of the equation

$$h = e^{z(h-1)}. \tag{50}$$

The results for $z^{(g)}$ and $z^{(f)}$ are shown in Fig. 1. They are compared with the results of the MC simulations of ER graphs. The agreement is excellent, and there are no visible finite-size effects (error bars are smaller than the size of the points). The degree distribution can now be easily obtained from (41). The results are presented in Fig. 2. Again, the agreement is very good without any noticeable finite-size effects.

FIG. 4: Average degree $z$ (dashed line), average degree $z^{(g)}$ of the connected component (upper solid line), and average degree $z^{(f)}$ of the rest (lower solid line) as a function of $\kappa$ for graphs with exponential degree distribution. Circles mark the results of MC simulations.

In this case it may be instructive to derive those results in a simpler way: when we omit the giant component from our considerations we are left with a graph with $hV$ vertices and $h^2 L$ links on average. As there are no further restrictions, we can assume that this graph is an Erdős-Rényi graph as well. This means that its degree distribution is again Poissonian with mean $z^{(f)}$:

$$p_k^{(f)} = e^{-z^{(f)}}\,\frac{\big(z^{(f)}\big)^k}{k!} = e^{-zh}\,\frac{z^k h^k}{k!}. \tag{51}$$

From the relation $h = u$ we obtain formula (41). Finally, for $\bar{k}^{(g)}(k)$ we get

$$\bar{k}^{(g)}(k) = \frac{z + 1 - h^k\,(zh + 1)}{1 - h^k}. \tag{52}$$

The results are presented in Fig. 3. One can see clearly the appearance of correlations in the giant connected component, as argued in the Introduction. The agreement with the predicted values is again very good.
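The ER predictions can be evaluated in a few lines. The sketch below assumes $z > 1$ (so that a giant component exists), solves $h = e^{z(h-1)}$ by fixed-point iteration from $h = 0$, and then evaluates the closed forms $z^{(g)} = z(1-u^2)/(1-h)$, $z^{(f)} = z u^2/h$, $p_k^{(g)} = p_k (1-u^k)/(1-h)$ with $u = h$, and $\bar{k}^{(g)}(k) = [z + 1 - h^k(zh + 1)]/(1 - h^k)$. The function name, the fixed iteration count, and the truncation `k_max` are illustrative assumptions.

```python
import math


def er_giant_component(z, k_max=20, n_iter=10_000):
    """Analytic ER predictions for z > 1; returns (h, z_g, z_f, p_g, k_bar_g)."""
    h = 0.0
    for _ in range(n_iter):              # fixed-point iteration for h = exp(z (h - 1))
        h = math.exp(z * (h - 1.0))
    u = h                                # for ER graphs G0 = G1, hence u = h

    z_g = z * (1.0 - u**2) / (1.0 - h)   # average degree in the giant component
    z_f = z * u**2 / h                   # average degree in the finite components

    p = [math.exp(-z) * z**k / math.factorial(k) for k in range(k_max + 1)]
    p_g = [p[k] * (1.0 - u**k) / (1.0 - h) for k in range(k_max + 1)]

    # Average nearest-neighbor degree in the giant component, k >= 1.
    k_bar_g = {
        k: (z + 1.0 - h**k * (z * h + 1.0)) / (1.0 - h**k)
        for k in range(1, k_max + 1)
    }
    return h, z_g, z_f, p_g, k_bar_g


h, z_g, z_f, p_g, k_bar_g = er_giant_component(2.0)
```

For $z = 2$ the values of $\bar{k}^{(g)}(k)$ decrease with $k$ toward the uncorrelated value $z + 1 = 3$, which is the disassortative behavior described above.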



B. Exponential degree distribution 

As the second example we take graphs with exponential degree distribution

$$p_k = \left(1 - e^{-1/\kappa}\right) e^{-k/\kappa}. \tag{53}$$

The average degree in this case is

$$z = \frac{e^{-1/\kappa}}{1 - e^{-1/\kappa}} \approx \kappa - \frac{1}{2}, \qquad \kappa \gg 1, \tag{54}$$

and

$$G_0(x) = \frac{1 - e^{-1/\kappa}}{1 - x\,e^{-1/\kappa}}, \qquad G_1(x) = G_0^2(x). \tag{55}$$

This implies $u = h^2$. The giant component appears for $\kappa > 1/\ln 3 \approx 0.91$. The results for $z^{(g)}$ and $z^{(f)}$ are presented in Fig. 4. As in the previous example, there are no visible deviations from the theoretical predictions.

FIG. 5: Degree distribution for graphs with exponential degree distribution with $\kappa = 1.5$. Circles mark the results of MC simulations for the giant component and diamonds for the full graph; squares stand for the special case of connected graphs without leaves described in Sec. VI. Solid lines denote analytical solutions.

FIG. 6: $\bar{k}(k)$ for graphs with exponential degree distribution with $\kappa = 1.5$. Circles mark the results of MC simulations for the giant component and diamonds for the full graph; squares stand for the special case of connected graphs without leaves described in Sec. VI. Solid lines denote analytical solutions.

In Figs. 5 and 6 results for $p_k^{(g)}$ and $\bar{k}^{(g)}(k)$ are presented for $\kappa = 1.5$. We observe the same kind of correlations in the giant component as in the case of ER graphs.
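The statements $u = h^2$ and $\kappa > 1/\ln 3$ for the exponential distribution can be checked numerically. The sketch below is an illustration only: it uses $G_0(x) = (1 - e^{-1/\kappa})/(1 - x e^{-1/\kappa})$ and $G_1 = G_0^2$, and iterates $u \leftarrow G_1(u)$ from $u = 0$; the function name and tolerances are assumptions.

```python
import math


def exponential_giant_component(kappa, tol=1e-14, max_iter=1_000_000):
    """Return (u, h, z) for p_k = (1 - q) q^k with q = exp(-1/kappa)."""
    q = math.exp(-1.0 / kappa)

    def G0(x):
        return (1.0 - q) / (1.0 - x * q)

    u = 0.0
    for _ in range(max_iter):
        u_next = G0(u) ** 2            # G1(x) = G0(x)^2 for this distribution
        converged = abs(u_next - u) < tol
        u = u_next
        if converged:
            break
    h = G0(u)                          # fraction of vertices outside the giant component
    z = q / (1.0 - q)                  # average degree
    return u, h, z


u, h, z = exponential_giant_component(1.5)          # above the threshold
u_sub, h_sub, z_sub = exponential_giant_component(0.8)  # below 1/ln 3
```

Above the threshold the iteration settles at $u < 1$ with $u = h^2$; below it the only solution is $u = h = 1$, i.e., there is no giant component.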



C. Scale-free graphs

Probably the most interesting case is that of scale-free graphs with distribution $p_k \sim k^{-\beta}$. While studying them we have to consider two scenarios: $2 < \beta < 3$ and $\beta > 3$. In the first case we expect correlations between node degrees, as pointed out in Refs. [17]-[19]. This invalidates the derivation of both Eq. (18) and Eq. (45). Additionally the quantity $\langle k^2 \rangle$ diverges and so $\bar{k}(k)$ is not defined. Because our aim was to investigate the correlations appearing solely as an effect of the connectedness of graphs, we have decided not to study the $\beta < 3$ case in this paper. This is, however, an interesting issue and merits further investigation. One line of pursuit is to use the algorithm proposed in [19] to generate uncorrelated graphs with heavy tails. Then one should obtain predictions at least for the joint probability $p_{q,r}$, which does not contain any divergences. One could also use a $V$-dependent "cutoff" distribution instead of the "full" distribution $p_k \sim k^{-\beta}$. This would yield $V$-dependent results, but may not be feasible analytically. In the case of $\beta < 2$ already the first moment of the distribution $p_k$ is not defined and the generating function approach fails completely.

FIG. 7: Degree distribution for scale-free graphs with $\beta = 3.25$. Circles mark the results for the giant component and diamonds for the full graph; squares stand for the special case of connected graphs without leaves described in Sec. VI. Solid lines denote analytical solutions.

When $\beta > 3$, $\langle k^2 \rangle$ is finite and there are no correlations, at least in the infinite-size limit [10]. However, for finite $V$ we expect strong finite-size effects for $\beta$ close to 3. To see this let us estimate the asymptotic behavior of $\langle k^2 \rangle$:

$$\langle k^2 \rangle \approx \langle k^2 \rangle_\infty - \int_{k_c}^{\infty} k^2\,p_k\,dk \approx \langle k^2 \rangle_\infty - c\,V^{-(\beta-3)/(\beta-1)}. \tag{56}$$

In the above we have assumed the natural cutoff $k_c(V) \sim V^{1/(\beta-1)}$. For $\beta$ close to 3, this converges very slowly. To observe those effects we have simulated our system at $\beta = 13/4$, when $\langle k^2 \rangle$ approaches its asymptotic value as $V^{-1/9}$. The results of our simulations of graphs with 5000 vertices are presented in Figs. 7 and 8. As expected the data for the $p_k$ and $p_k^{(g)}$ distributions show strong cutoff effects around $k = 40$, but for smaller values of $k$ the agreement with theoretical predictions is rather good. Looking at the results for $\bar{k}(k)$ we notice two things: (i) Data for the full graph show a deviation from a straight line, indicating the presence of some correlations due to heavy tails. (ii) Data for the giant connected component show a very strong effect of correlations. The agreement with theoretical values is very poor, so we have not included them in the picture. This is due to the described cutoff effect on $\langle k^2 \rangle$. We can obtain a better agreement if we use in the formula for $\bar{k}^{(g)}(k)$ the actual value of $\langle k^2 \rangle$ measured in simulations instead of its infinite-volume limit.

FIG. 8: $\bar{k}(k)$ for scale-free graphs with $\beta = 3.25$. Circles mark the results for the giant component and diamonds for the full graph; squares stand for the special case of connected graphs without leaves described in Sec. VI. Solid lines denote analytical solutions.
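The slow convergence of $\langle k^2 \rangle$ can be illustrated with a rough numerical estimate. The sketch below makes several assumptions that are not taken from the paper: a pure power law $p_k \propto k^{-\beta}$ starting at $k = 1$, a hard cutoff at $k_c = \lfloor V^{1/(\beta-1)} \rfloor$, and renormalization of the truncated distribution. It is meant only to show the order of magnitude of the effect.

```python
def natural_cutoff(beta, V):
    """Natural cutoff k_c ~ V^(1/(beta-1)), rounded down to an integer."""
    return int(V ** (1.0 / (beta - 1.0)))


def second_moment(beta, V, k_min=1):
    """<k^2> of p_k ~ k^(-beta) truncated at the natural cutoff (hard cutoff assumed)."""
    k_c = natural_cutoff(beta, V)
    ks = range(k_min, k_c + 1)
    norm = sum(k ** (-beta) for k in ks)
    return sum(k ** (2.0 - beta) for k in ks) / norm


# beta = 13/4 as in the simulations; V = 5000 gives a cutoff close to k = 40.
k_c_5000 = natural_cutoff(3.25, 5000)
m2_small = second_moment(3.25, 5000)
m2_large = second_moment(3.25, 500_000)
```

With these assumptions the cutoff for $V = 5000$ falls near the value $k \approx 40$ at which the simulated distributions start to deviate, and increasing $V$ a hundredfold still changes $\langle k^2 \rangle$ noticeably.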



FIG. 9: Degree distribution in connected ER graphs with various average degrees. Points mark the results of MC simulations, while solid lines denote analytical solutions. The size of each graph is 5000 vertices.

FIG. 10: $\bar{k}(k)$ for connected ER graphs with various average degrees. Points mark the results of MC simulations, while solid lines denote analytical solutions. The size of each graph is 5000 vertices.

V. CONNECTED GRAPHS

Finally, we would like to calculate the properties of the maximally random connected graphs. To this end we assume that the ensemble of giant connected components of the maximal entropy graphs with distribution $p_k$ is a maximal entropy ensemble of connected graphs with distribution $p_k^{(g)}$ (we neglect the fluctuations in the number of vertices and links of the giant component). This is a plausible assumption, as we do not impose any additional constraints except connectivity. In Appendix B we provide a more detailed argument. With this assumption the properties of the maximal entropy connected random graphs with distribution $p_k^{(g)}$ and/or average degree $z^{(g)}$ are the same as those of the giant component of the maximal entropy random graphs with distribution $p_k$ and/or average degree $z$ given by Eqs. (41) and (25a).



A. Connected ER graphs

By connected ER graphs we mean maximal entropy connected graphs with a given average degree $z^{(g)}$. According to the arguments from the previous section this ensemble corresponds to the ensemble of giant components in ER graphs with average degree $z$ related by Eq. (25a). For a given $z^{(g)}$ we solve this equation for $z$ (numerically) and use formulas (41) and (52) for the degree distribution and for $\bar{k}(k)$, respectively. The results are presented in Figs. 9 and 10 and compared with previously published MC data for connected graphs. The agreement is very good, which confirms the validity of the assumption made in the previous section.
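The numerical solution for $z$ can be sketched with a simple bisection. For ER graphs $u = h$, so $z^{(g)} = z(1 - h^2)/(1 - h) = z(1 + h)$, where $h$ solves $h = e^{z(h-1)}$. The sketch assumes $z^{(g)} > 2$ and that $z(1 + h(z))$ increases with $z$ for $z > 1$; the function names and tolerances are illustrative, and this is not necessarily the procedure used to produce the figures.

```python
import math


def er_finite_fraction(z, tol=1e-14, max_iter=1_000_000):
    """Smallest solution h of h = exp(z (h - 1)), by iteration from h = 0."""
    h = 0.0
    for _ in range(max_iter):
        h_next = math.exp(z * (h - 1.0))
        converged = abs(h_next - h) < tol
        h = h_next
        if converged:
            break
    return h


def z_from_z_g(z_g, tol=1e-10):
    """Invert z_g = z (1 + h(z)) for z by bisection on the interval (1, z_g)."""
    lo, hi = 1.0, z_g            # z_g -> 2 as z -> 1, and z_g >= z
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * (1.0 + er_finite_fraction(mid)) < z_g:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


z_g_example = 2.0 * (1.0 + er_finite_fraction(2.0))
z_recovered = z_from_z_g(z_g_example)
```

For $z^{(g)}$ very close to 2 the inner iteration converges slowly, because the corresponding $z$ approaches the percolation threshold.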
B. Connected random graphs with arbitrary degree distribution

To calculate the properties of connected random graphs with arbitrary degree distribution we need to invert Eq. (41). This can be done by rewriting it as

$$p_k = (1 - h)\,\frac{p_k^{(g)}}{1 - u^k}, \qquad k > 0, \tag{57}$$

where $u$ satisfies Eq. (24):

$$u = \frac{\sum_{k=1}^{\infty} p_k^{(g)}\,\dfrac{k\,u^{k-1}}{1 - u^k}}{\sum_{k=1}^{\infty} p_k^{(g)}\,\dfrac{k}{1 - u^k}}. \tag{58}$$



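The above equation can be solved by a simple iteration procedure. To prove that it has a solution we rewrite it as

$$\sum_{k=1}^{\infty} p_k^{(g)}\,k\,u\,\frac{1 - u^{k-2}}{1 - u^k} \equiv g(u) = 0. \tag{59}$$

It is easy to check that

$$g(1) = \sum_{k=1}^{\infty} (k - 2)\,p_k^{(g)} = z^{(g)} - 2, \qquad g(0) = -p_1^{(g)}. \tag{60}$$

So for connected graphs $g(1)$ is positive ($z^{(g)} > 2$) and $g(0)$ negative ($p_1^{(g)} > 0$), and by continuity $g$ has a root in the interval $(0, 1)$.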

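Once we know $u$ we can calculate $h$ and $p_0$ from the normalization of the distribution $p_k$ and Eq. (23):

$$1 = p_0 + (1 - h) \sum_{k=1}^{\infty} \frac{p_k^{(g)}}{1 - u^k}, \qquad h = p_0 + (1 - h) \sum_{k=1}^{\infty} \frac{p_k^{(g)}\,u^k}{1 - u^k}. \tag{61}$$

Because $\sum_{k=1}^{\infty} \frac{p_k^{(g)}}{1 - u^k} - \sum_{k=1}^{\infty} \frac{p_k^{(g)}\,u^k}{1 - u^k} = 1$, those two equations are not independent and we can set $p_0 = 0$. Then,

$$h = \frac{\sum_{k=1}^{\infty} p_k^{(g)}\,\dfrac{u^k}{1 - u^k}}{\sum_{k=1}^{\infty} p_k^{(g)}\,\dfrac{1}{1 - u^k}}. \tag{62}$$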

C. Simulating connected graphs 

This procedure may actually be used to generate connected random graphs in an efficient way. Instead of generating connected graphs with degree distribution $p_k^{(g)}$ and checking the connectivity after every move, we can generate graphs with distribution $p_k$ given by (57) and use the giant connected component. This still requires calculating the connected parts, but it needs to be done only once before each measurement.
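The "generate, then keep the giant component" idea can be sketched as follows. Note that this sketch swaps in random stub matching (a configuration-model style construction) for the Metropolis-type algorithm used in this paper, and it simply drops self-loops and repeated links, which slightly biases the ensemble. All function names and the example degree sequence are hypothetical.

```python
import random
from collections import deque


def configuration_graph(degrees, rng):
    """Random stub matching; self-loops and repeated links are simply dropped."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    adjacency = [set() for _ in degrees]
    # Pair consecutive stubs; an unpaired last stub (odd total) is ignored.
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def giant_component(adjacency):
    """Return the vertex set of the largest connected component (breadth-first search)."""
    seen, best = set(), set()
    for start in range(len(adjacency)):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        if len(component) > len(best):
            best = component
    return best


rng = random.Random(1)
degrees = [3] * 200                  # hypothetical degree sequence
adjacency = configuration_graph(degrees, rng)
giant = giant_component(adjacency)   # connectivity is computed once, at the end
```

In an actual application the degree sequence would be drawn from the distribution $p_k$ obtained from (57), and measurements would be taken on the subgraph induced by `giant`.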

As an example, we have generated connected maximally random graphs with a Poissonian degree distribution restricted to $k > 0$, with $p_0^{(g)} = 0$ (63), and $z^{(g)} \approx 2.7236$. For this distribution $u \approx 0.1209$, $h \approx 0.0341$, and $z \approx 2.6696$. We have simulated a maximally random graph with $5000/(1 - h) \approx 5177$ vertices and 6910 links with degree distribution (57). We generated 10 000 independent graphs. The average size of the giant component was $5000.24 \pm 0.25$ with standard deviation $\approx 20$. The degree distribution in the connected component agrees very well with the desired one, as can be seen in Fig. 11.

FIG. 11: Degree distribution $p_k^{(g)}$ in the connected giant component. Circles mark the results of MC simulation, while the solid line denotes the desired distribution.
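The inversion can be sketched numerically: iterate Eq. (58) for $u$, then obtain $h$ from the normalization with $p_0 = 0$ and $p_k$ from Eq. (57). The example input below is an assumption: a zero-truncated Poissonian $p_k^{(g)} \propto \lambda^k/k!$ with $\lambda = 2.5$, chosen here because its mean is $\approx 2.7236$; the function name, the starting point $u = 0$, and the truncation at $k = 60$ are likewise illustrative.

```python
import math


def invert_giant_distribution(p_g, tol=1e-14, max_iter=100_000):
    """Given p_g[k] (k = 0..K, p_g[0] = 0), return (u, h, p) with p[0] = 0."""
    ks = range(1, len(p_g))
    u = 0.0
    for _ in range(max_iter):
        # One step of the simple iteration for u.
        num = sum(p_g[k] * k * u**(k - 1) / (1.0 - u**k) for k in ks)
        den = sum(p_g[k] * k / (1.0 - u**k) for k in ks)
        u_next = num / den
        converged = abs(u_next - u) < tol
        u = u_next
        if converged:
            break
    a = sum(p_g[k] * u**k / (1.0 - u**k) for k in ks)
    h = a / (1.0 + a)                      # normalization with p_0 = 0
    p = [0.0] + [(1.0 - h) * p_g[k] / (1.0 - u**k) for k in ks]
    return u, h, p


lam = 2.5                                  # assumed parameter of the truncated Poissonian
norm = math.exp(lam) - 1.0
p_g = [0.0] + [lam**k / math.factorial(k) / norm for k in range(1, 61)]
u, h, p = invert_giant_distribution(p_g)
z = sum(k * pk for k, pk in enumerate(p))  # average degree of the full graph
```

Under this assumption the resulting $u$, $h$, and $z$ come out close to the values quoted in the text.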



VI. UNCORRELATED CONNECTED GRAPHS

An interesting situation arises when $p_1 = 0$; i.e., vertices with degree 1 (leaves) are forbidden. Then $u = 0$ and $h = p_0$. This means that the resulting graph consists of one giant connected component and $p_0 V$ isolated vertices only. This is easy to understand: finite connected components are trees, but there are no trees without leaves, except the degenerate ones made of a single vertex. If we additionally set $p_0 = 0$, then we will obtain a graph containing only the giant component, i.e., a connected graph.

But as observed in Sec. III C, $u = 0$ implies the absence of correlations. This supports our argument made in the Introduction about the role of the one-degree vertices in the appearance of correlations in a connected graph. Using the results of the previous section we can state that vertex degrees in the maximal entropy connected random graphs are uncorrelated if and only if $p_1 = 0$; i.e., there are no leaves in the graph.
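A small numerical check of this argument, under stated assumptions: the degree distribution is an exponential one with $p_0 = p_1 = 0$, truncated at a finite `k_max`, and the correlation strength is measured by the factor $(1 - u^{q+r-2})/(1 - u^2)$ that multiplies the uncorrelated form $q p_q\, r p_r / z^2$ of the joint distribution in the giant component. The function names are hypothetical.

```python
import math


def leafless_exponential(kappa, k_max=400):
    """p_k proportional to exp(-k/kappa) for k >= 2, with p_0 = p_1 = 0."""
    q = math.exp(-1.0 / kappa)
    return [0.0, 0.0] + [(1.0 - q) * q**(k - 2) for k in range(2, k_max + 1)]


def iterate_u(p, start=0.5, n_iter=2000):
    """Iterate u <- G1(u) from `start`; returns (u, z)."""
    z = sum(k * pk for k, pk in enumerate(p))
    u = start
    for _ in range(n_iter):
        u = sum(k * pk * u**(k - 1) for k, pk in enumerate(p) if k > 0) / z
    return u, z


def correlation_factor(u, q, r):
    """Factor multiplying q p_q r p_r / z^2 in the giant-component joint distribution."""
    return (1.0 - u**(q + r - 2)) / (1.0 - u**2)


u, z = iterate_u(leafless_exponential(1.5))
factor_without_leaves = correlation_factor(u, 2, 2)
factor_leaf_leaf = correlation_factor(0.35, 1, 1)   # illustrative u > 0 with leaves
```

Without leaves the iteration collapses to $u = 0$ and the factor equals 1 for all allowed degrees, so the joint distribution factorizes. With leaves ($u > 0$) the factor for $q = r = 1$ vanishes, reflecting that two leaves are never neighbors in a connected graph.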

As a check, we have carried out simulations with the exponential degree distribution and no leaves:

$$p_k = \frac{1 - e^{-1/\kappa}}{e^{-2/\kappa}}\,e^{-k/\kappa}, \qquad k > 1, \qquad p_0 = p_1 = 0, \tag{64}$$

for $\kappa = 1.5$ ($z \approx 3.055$). The results for the giant component, which consisted on average of more than 99.9% of the whole graph, are presented in Figs. 5 and 6 (squares). As predicted, vertices are uncorrelated, in stark contrast to the $p_1 > 0$ case plotted in the same figures.

We have also performed simulations for the scale-free 
distribution l/Zc^"^/^ and no leaves. The results are pre- 
sented in Figs.[7]and[5](squares). We see that correlations 
are very much suppressed compared to the case when we 
admit leaves (presented in the same figures). The slight 
remaining correlation is due to long tails as explained in
Sec. IV C.



VII. SUMMARY 

In this paper we have studied the correlations in connected
random graphs. We have extended the results of
Refs. d, i and calculated correlations in the giant
connected components of random graphs. We argue that
those correlations are related to the presence of nodes
with degree 1, suggesting that the only cause of correlations
is the absence of "hedgehogs." This has already
been stated in [11], where it has been shown that in the
grand-canonical ensemble of arbitrary-sized trees, where
"hedgehogs" appear, correlations vanish. We find this to
be a very interesting issue that merits further study.

The correlations observed in connected random graphs
are an example of the so-called "structural" or "kinematic"
correlations, as they appear as a consequence of
some global constraint. This should be contrasted with
"dynamic" correlations, which are the result of local
two-point interactions between vertices. Such correlations
may be generated by two-point weights [20]. This
distinction can be important in simplicial quantum gravity,
where degree-degree correlations are interpreted as
curvature-curvature correlations (see, for example, pH). 
However, as the simplicial manifolds are connected by
definition, those correlations are due to the above-described
mechanism rather than to some kind of gravitational
interaction [11]. We believe that our results
may help in clarifying such issues and in the interpretation
of data obtained from MC simulations.

Finally, we have shown how to relate the giant con- 
nected components to the maximal entropy connected 
graphs ensemble. This allowed us to propose an efficient 
method for generating connected random graphs based 
on the Metropolis algorithm. 



Appendix A: Non-self-averaging ensemble 

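Denoting by $\mathcal{G}(V; k)$ the ensemble of all simple regular
graphs with $V$ vertices and degree $k$ (in a regular graph
all vertices have the same degree), we define

$$\mathcal{G}(V) = \bigcup_k \mathcal{G}(V; k), \qquad P(G) = \frac{W_k}{\#\mathcal{G}(V; k)} \quad \text{for } G \in \mathcal{G}(V; k), \tag{A1}$$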



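where $\#\mathcal{G}(V; k)$ denotes the number of graphs in the
ensemble $\mathcal{G}(V; k)$ and $W_k$ is an arbitrary probability
distribution. With this definition, and writing $p_q(G)$ for the
degree distribution of a single graph $G$ (so that
$p_q(G) = \delta_{k,q}$ for $G \in \mathcal{G}(V; k)$), we find

$$p_q = \sum_{G \in \mathcal{G}} P(G)\, p_q(G) = \sum_k \frac{W_k}{\#\mathcal{G}(V; k)} \sum_{G \in \mathcal{G}(V; k)} \delta_{k,q} = W_q. \tag{A2}$$

It is easy to note that this poorly describes the distributions
of single graphs, which are just $\delta$'s. The variance of
$p_q$ is

$$\sum_{G \in \mathcal{G}} P(G) \left(p_q(G) - W_q\right)^2 = \sum_k \frac{W_k}{\#\mathcal{G}(V; k)} \sum_{G \in \mathcal{G}(V; k)} \left(\delta_{k,q} - W_q\right)^2 = \sum_k W_k \left(\delta_{k,q} - W_q\right)^2 = W_q - 2W_q^2 + W_q^2 \sum_k W_k, \tag{A3}$$

which equals $W_q(1 - W_q)$ and indeed does not disappear in the large-$V$ limit.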
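For correlations we obtain, for the joint degree distribution of the
two vertices at the ends of a link,

$$\sum_k W_k\, \delta_{q,k}\, \delta_{r,k} = W_q\, \delta_{q,r}, \tag{A4}$$

and, for the product of the corresponding single-vertex distributions,

$$\sum_k W_k\, \delta_{k,q} \sum_{k'} W_{k'}\, \delta_{k',r} = W_q W_r. \tag{A5}$$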

So the condition ([TU| is not satisfied. This means that
vertices on each particular graph are uncorrelated, but
are correlated if the whole ensemble is considered. This is
easy to explain: if we pick a link from a graph with a
given $k$, then the information about the first vertex does
not provide any additional information; however, if we
do not know $k$, then the degree of the first vertex
immediately gives us the degree of its neighbor.
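
The mismatch between the joint distribution and the product of single-vertex distributions can be illustrated numerically. The sketch below assumes only what is stated above: each graph is $k$-regular with probability $W_k$, so the degree distribution on a single graph is a Kronecker delta. The function name and the example weights are hypothetical.

```python
def ensemble_moments(W):
    """Ensemble averages for a mixture of regular graphs.

    W maps a degree k to its weight W_k (weights sum to 1). On a k-regular
    graph the degree distribution at either end of a link is delta_{k,q}.
    """
    degrees = sorted(W)
    # Ensemble-averaged degree distribution: sum_k W_k delta_{k,q} = W_q.
    mean = {q: sum(W[k] * (k == q) for k in degrees) for q in degrees}
    # Variance over the ensemble: sum_k W_k (delta_{k,q} - W_q)^2.
    var = {q: sum(W[k] * ((k == q) - mean[q]) ** 2 for k in degrees)
           for q in degrees}
    # Joint distribution of the two link ends: sum_k W_k delta_{q,k} delta_{r,k}.
    joint = {(q, r): sum(W[k] * (k == q) * (k == r) for k in degrees)
             for q in degrees for r in degrees}
    # Product of the two single-end averages.
    product = {(q, r): mean[q] * mean[r] for q in degrees for r in degrees}
    return mean, var, joint, product


# Hypothetical weights: 3-regular graphs with weight 1/4, 4-regular with 3/4.
W = {3: 0.25, 4: 0.75}
mean, var, joint, product = ensemble_moments(W)
for key in sorted(joint):
    print(key, "joint =", joint[key], "product =", product[key])
```

For any weights with more than one nonzero $W_k$, the joint distribution is diagonal while the product is not, which is the ensemble-level correlation described above.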

Let $\mathcal{G}$ and $P(G)$ define a maximal entropy ensemble
with $V$ vertices, $L$ links, and vertex degree distribution
$p_k$. We assume that the probability $P(G)$ factorizes:

$$P(G) = \prod_{C \in G} P_c(C), \tag{B1}$$

where $C$ are the connected components of the graph $G$.

Let $\mathcal{G}_c$ denote the ensemble of all giant connected
components. We assume that we can neglect the fluctuations,
so all the graphs in this ensemble have $V^{(g)}$ vertices and
$L^{(g)}$ links. The degree distribution in this ensemble is
$p_k^{(g)}$. Because of the property (B1), the entropy (1) of
the whole ensemble $(\mathcal{G}, P)$ is the sum of the entropy of
the giant connected component ensemble and the rest:

$$S = S^{(g)} + S^{(r)}. \tag{B2}$$

Now we assume that there exists a probability $P'_c$ defined
on the ensemble $\mathcal{G}_c$ such that the entropy

$$-\sum_{G \in \mathcal{G}_c} P'_c(G) \ln P'_c(G) \tag{B3}$$

is greater than $S^{(g)}$ but the vertex degree probability
distribution remains unchanged. Then we can define a
new probability on the ensemble $\mathcal{G}$:

$$P'(G) = P'_c(C^{(g)}) \prod_{C \in G,\; C \neq C^{(g)}} P_c(C), \tag{B4}$$

where $C^{(g)}$ is the giant connected component of graph $G$.
The degree distribution of the ensemble $(\mathcal{G}, P')$ would be
the same as that of the $(\mathcal{G}, P)$ ensemble, but according to
(B2), its entropy would be greater. This contradicts the
assumption that $(\mathcal{G}, P)$ is the maximal entropy ensemble
and proves that the ensemble of giant connected components
is a maximal entropy ensemble.